SYNTHETIC LEADER PEPTIDE SEQUENCES
FIELD OF INVENTION 

The present invention relates to novel synthetic leader peptide sequences for secreting polypeptides in yeast.

BACKGROUND OF THE INVENTION 


Yeast organisms produce a number of proteins which are synthesized intracellularly but which have a function outside the cell. Such extracellular proteins are referred to as secreted proteins. These secreted proteins are initially expressed inside the cell in a precursor or pre-protein form containing a pre-peptide sequence which ensures effective direction of the expressed product across the membrane of the endoplasmic reticulum (ER) into the secretory pathway of the cell. The pre-sequence, normally named a signal peptide, is generally cleaved off from the desired product during translocation. Once it has entered the secretory pathway, the protein is transported to the Golgi apparatus. From the Golgi, the protein can follow different routes that lead to compartments such as the cell vacuole or the cell membrane, or it can be routed out of the cell to be secreted to the external medium (Pfeffer, S.R. and Rothman, J.E., Ann. Rev. Biochem. 56 (1987) 829-852).

Several approaches have been suggested for the expression and secretion in yeast of proteins heterologous to yeast. European published patent application No. 88 632 describes a process by which proteins heterologous to yeast are expressed, processed and secreted by transforming a yeast organism with an expression vehicle harbouring DNA encoding the desired protein and a signal peptide, preparing a culture of the transformed organism, growing the culture and recovering the protein from the culture medium. The signal peptide may be the signal peptide of the desired protein itself, a heterologous signal peptide or a hybrid of native and heterologous signal peptides.

A problem encountered with the use of signal peptides heterologous to yeast may be that the heterologous signal peptide does not ensure efficient translocation and/or cleavage of the precursor polypeptide after the signal peptide.

The Saccharomyces cerevisiae MFα1 (α-factor) is synthesized as a pre-pro form of 165 amino acids comprising a signal- or pre-peptide of 19 amino acids followed by a "leader" or pro-peptide of 64 amino acids, encompassing three N-linked glycosylation sites, followed by (LysArg((Asp/Glu)Ala)₂₋₃α-factor)₄ (Kurjan, J. and Herskowitz, I., Cell 30 (1982) 933-943). The signal-leader part of the pre-pro MFα1 has been widely employed to obtain synthesis and secretion of heterologous proteins in S. cerevisiae.

Use of signal/leader peptides homologous to yeast is known from, inter alia, US patent specification No. 4,546,082, European published patent applications Nos. 116 201, 123 294, 123 544, 163 529 and 123 289 and DK patent application No. 3614/83.

In EP 123 289, utilization of the S. cerevisiae a-factor precursor is described, whereas WO 84/01153 indicates utilization of the S. cerevisiae invertase signal peptide and DK 3614/83 utilization of the S. cerevisiae PHO5 signal peptide for secretion of foreign proteins.

US patent specification No. 4,546,082 and EP 116 201, 123 294, 123 544 and 163 529 describe processes by which the α-factor signal-leader from S. cerevisiae (MFα1 or MFα2) is utilized in the secretion process of expressed heterologous proteins in yeast. By fusing a DNA sequence encoding the S. cerevisiae MFα1 signal/leader sequence at the 5' end of the gene for the desired protein, secretion and processing of the desired protein was demonstrated.



EP 206 783 discloses a system for the secretion of polypeptides from S. cerevisiae using an α-factor leader sequence which has been truncated to eliminate the four α-factor units present on the native leader sequence, so as to leave the leader peptide itself fused to a heterologous polypeptide via the α-factor processing site LysArgGluAlaGluAla. This construction is indicated to lead to efficient processing of smaller peptides (less than 50 amino acids). For the secretion and processing of larger polypeptides, the native α-factor leader sequence has been truncated to leave one or two of the α-factor units between the leader peptide and the polypeptide.

A number of secreted proteins are routed so as to be exposed to a proteolytic processing system which can cleave the peptide bond at the carboxy end of two consecutive basic amino acids. In S. cerevisiae this enzymatic activity is encoded by the KEX 2 gene (Julius, D.A. et al., Cell 37 (1984b) 1075). Processing of the product by the KEX 2 protease is needed for the secretion of active S. cerevisiae mating factor α1 (MFα1 or α-factor), whereas KEX 2 is not involved in the secretion of active S. cerevisiae mating factor a.
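The cleavage rule just described, cutting on the carboxy side of two consecutive basic residues, can be modelled as a simple string operation. The sketch below is a simplification of our own: it treats any pair of Lys (K) and Arg (R) residues as a site (the pairs later listed for the PS processing site), scans non-overlapping pairs from the N-terminus, and ignores the sequence context that affects real KEX 2 specificity. The fusion of the LA23 leader of Table 1 with an EEGEPK-extended MI3 insulin precursor is a hypothetical example that only illustrates the bookkeeping.

```python
BASIC = frozenset("KR")  # Lys and Arg in one-letter code


def kex2_like_cleave(seq: str) -> list[str]:
    """Split seq on the carboxy side of each non-overlapping basic pair (simplified model)."""
    fragments = []
    start = 0
    i = 0
    while i < len(seq) - 1:
        if seq[i] in BASIC and seq[i + 1] in BASIC:
            fragments.append(seq[start:i + 2])  # cut after the second basic residue
            start = i + 2
            i += 2  # the second residue of this pair is not reused
        else:
            i += 1
    fragments.append(seq[start:])
    return [f for f in fragments if f]  # drop the empty tail if seq ends in a pair


LA23 = "QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLISMAKR"  # Table 1 leader ending in KR
MI3 = "FVNQHLCGSHLVEALYLVCGERGFFYTPK" + "AAK" + "GIVEQCCTSICSLYQLENYCN"
product = "EEGEPK" + MI3  # hypothetical spacer-extended product

print(kex2_like_cleave(LA23 + product))
```

Because this MI3 product contains no internal basic pair, the only cut falls after the leader's KR; a product with an internal Lys-Arg or Arg-Arg would be fragmented, which is the situation that motivates alternative processing sites.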

Secretion and correct processing of a polypeptide intended to be secreted is obtained in some cases when culturing a yeast organism which is transformed with a vector constructed as indicated in the references given above. In many cases, however, the level of secretion is very low or there is no secretion, or the proteolytic processing may be incorrect or incomplete, resulting in secretion of a considerable amount of leader-bound product polypeptide. Prosequences, and especially N-terminally located prosequences, or leader sequences expressed in eucaryotic cells, such as yeast cells, are extensively glycosylated, cf. Fiedler and Simons, Cell, 81, p. 309-312; and Moir, D.T., Yeast mutants with increased secretion efficiency, in Yeast Genetic Engineering, Barr, P.J., Brake, A.J., and Valenzuela, P., eds., wherein a general review of glycosylation and secretion of proteins is presented. It is generally recognised that glycosylation, which may be N-linked, O-linked, or both, is important for efficient transport through the secretory pathway, cf. Caplan et al., Journal of Bacteriology, Vol. 173, No. 2, p. 627-635; and Jars et al., The Journal of Biological Chemistry, Vol. 270, No. 42, p. 24810-24817. Moreover, due to the extensive glycosylation, the purification of secreted propeptides is difficult and differs considerably from the processing steps that are typically employed for the purification of the mature secreted polypeptide.
Clements et al., Gene, 106 (1991) 267-272, secreted hEGF from yeast using a eucaryotic consensus signal sequence combined with either of two 19-aa pro-sequences which comprise fractions of the α-factor leader and are identical except for the presence or absence of a potential Asn-linked (N-linked) glycosylation site. They showed that the presence or absence of this site had no effect on secretion, and that the level of secretion was comparable to the level obtained when using the α-factor prepro-sequence (about 3 µg/ml).


Expression of heterologous proteins as fusion proteins is a well-known concept and has been utilized in various contexts in different organisms. Secretory expression of a heterologous protein in yeast is often performed as a fusion protein with a secretion prepro-leader to confer secretion competence. Prepro-leaders tend to be hyperglycosylated or extensively O-linked glycosylated in the S. cerevisiae secretory pathway. Purification of hyperglycosylated fusion protein is laborious due to its heterogeneous nature. Efficient prepro-leaders lacking hyperglycosylation, with no or limited O-linked glycosylation, and with the dibasic Kex2 endoprotease site replaced by a more convenient enzymatic processing site, provide an alternative to conventional yeast expression: the fusion protein is purified and subsequently matured in vitro with a suitable enzyme, as exemplified herein for the insulin precursor. In vitro maturation of a purified fusion protein is more flexible, since dependency on the Kex2 endoprotease is eliminated and any proteolytic enzyme can be used for maturation, provided that the heterologous protein does not have any internal processing sites for that enzyme. Purification of the fusion protein from the culture supernatant followed by in vitro maturation will avoid N-terminal processing of the heterologous protein by dipeptidyl aminopeptidase. Secretion of a fusion protein rather than the heterologous protein has the advantage that the propeptide may increase stability and solubility until purification and maturation. Secretory expression in yeast of heterologous proteins with internal dibasic sites may lead to Kex2 endoprotease processing and a decrease in fermentation yield. This can be avoided by utilizing a secretion prepro-leader lacking N-linked glycosylation to confer secretion competence, introducing a suitable enzyme processing site between the prepro-leader and the heterologous protein, expressing the fusion protein in a Kex2 endoprotease-negative S. cerevisiae strain, and then purifying the fusion protein and maturing it in vitro.


It is an object of the present invention to provide novel synthetic leader peptides or pro-sequences which ensure a higher yield and a more efficient recovery and/or processing of polypeptides expressed in a eucaryotic host cell organism, preferably a fungal cell, such as a yeast cell or a filamentous fungus cell. The polypeptides are preferably secreted polypeptides, including leader-bound polypeptides and polypeptides fused N-terminally to peptide sequences including leader sequences and/or spacer sequences, each of which is optionally separated from the other constituent sequences by a processing site.


SUMMARY OF THE INVENTION 

A novel type of synthetic leader peptide has been found which allows secretion in high 
yield and/or improved recovery of a polypeptide produced in yeast. 


Accordingly, the present invention relates to a DNA construct encoding a polypeptide and having the structure SP-LP-(PS)-(S)-(PS)-*gene*, wherein SP is a DNA sequence (presequence) encoding a signal peptide, LP is a DNA sequence encoding a synthetic leader peptide (propeptide) lacking N-linked glycosylation, PS is an optional DNA sequence encoding a protease processing site, S is a DNA sequence encoding a spacer peptide, which is likewise optional, and *gene* is a DNA sequence encoding a polypeptide. The structure SP-LP-(PS)-(S)-(PS)-*gene* comprises the following structures: SP-LP-PS-S-PS-*gene*, SP-LP-PS-*gene*, SP-LP-PS-S-*gene*, SP-LP-S-*gene*, SP-LP-S-PS-*gene*, and SP-LP-*gene*; in structures containing more than one PS, these may be the same or different.
Preferably, PS is a DNA sequence encoding a yeast protease processing site, such as an endopeptidase processing site, and LP is preferably a DNA sequence encoding a synthetic leader peptide or prepro-leader with the general formula I:



(Q/S)PIDDTESQTTSVNLMADDTES(A/R)FATYTXLDW(N/G)L(ISMA/PGA)KR   (I)
wherein X is a codable amino acid or, preferably, a sequence of from 1 to 5 codable amino acids which may be the same or different and which are preferably selected from the group consisting of T, L, A, V, D, P, H, N, S, and G; Y is a codable amino acid selected from the group consisting of Q and N; and the C-terminal KR is an optional dibasic processing site.
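To make the alternatives in formula I explicit, the formula can be translated into a regular expression and used to test candidate leaders. This is a sketch under our reading of the formula, in particular of the preferred group for X; it restricts X to that preferred group rather than to any codable amino acid, and the names `FORMULA_I` and `fits_formula_i` are illustrative.

```python
import re

# Formula I in one-letter code, as read here:
#   (Q/S), (A/R), (N/G) and (ISMA/PGA) are alternatives,
#   Y is Q or N, X is 1-5 residues from the preferred group,
#   and the C-terminal KR is optional.
FORMULA_I = re.compile(
    r"[QS]PIDDTESQTTSVNLMADDTES"
    r"[AR]FAT[QN]T"          # (A/R) F A T Y T
    r"[TLAVDPHNSG]{1,5}"     # X
    r"LDW[NG]L"              # L D W (N/G) L
    r"(?:ISMA|PGA)"          # (ISMA/PGA)
    r"(?:KR)?"               # optional dibasic processing site
)


def fits_formula_i(leader: str) -> bool:
    """True if the whole leader sequence is an instance of formula I."""
    return FORMULA_I.fullmatch(leader) is not None


examples = {
    "LA23": "QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLISMAKR",
    "TA57": "QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGLISMAKR",
    "LA64": "QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLPGAKR",
    "LA34": "QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLISMA",
    "TA77": "QPIDDTESQTTSVNLMADDTESRALDWNLPGAKR",
}
for name, leader in examples.items():
    print(name, fits_formula_i(leader))
```

Most leaders of Table 1 (e.g. LA23, TA57, LA64, LA34) fit this pattern, whereas shortened variants such as TA77, which lack the FATYTX segment, do not.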



More preferably, LP is a DNA sequence encoding a synthetic leader peptide with the general formula II:

QPIDD(A/D)E(A/D)Q(A/D)(A/D)(A/D)VNLMADD(A/D)E(A/D)AFA(A/D)QVNLI(A/D)MAKR   (II)

wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine (S), or aspartic acid (D), and the C-terminal KR is an optional dibasic processing site;

or LP is a DNA sequence encoding a synthetic leader peptide with the general formula III:

QPIDD(A/D)E(A/D)Q(A/D)(A/D)(A/D)VNLMADD(A/D)E(A/D)AFA(A/D)Q(A/D)PLALDWNLI(A/D)MA   (III)

wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine (S), or aspartic acid (D). In formulas I and II above, the C-terminal amino acids KR define a yeast processing site which is optional.

In the present context, the expression "leader peptide" is understood to indicate a pro-peptide sequence whose function is to allow the expressed polypeptide product of *gene*, optionally fused at its N-terminal to a spacer peptide and/or a sequence of one or more amino acids defining a processing site, to be directed from the endoplasmic reticulum to the Golgi apparatus and further to a secretory vesicle for secretion into the medium (i.e. exportation of the expressed polypeptide across the cellular membrane and cell wall, if present, or at least through the cellular membrane into the periplasmic space of a cell having a cell wall). The term "synthetic" used in connection with leader peptides is intended to indicate that the leader peptide is one not found in nature; in particular, the leader peptide sequences of the present invention include neither the α-factor leader sequence or fragments and constructs thereof, such as the sequence QPVISTTVGSAAEGSLDKR, nor a leader sequence derived from the S. cerevisiae HSP150 protein having extensive O-linked glycosylation, cf. Simonen, M., Vihinen, H., Jamsa, E., Arumae, U., Kalkkinen, N., and Makarow, M. (1996) The hsp150Δ-carrier confers secretion competence to the rat nerve growth factor receptor ectodomain in Saccharomyces cerevisiae. Yeast 12, 457-466; and Jamsa, E., Holkeri, H., Vihinen, H., Wikstrom, M., Simonen, M., Walse, B., Kalkkinen, N., Paakkola, J., and Makarow, M. (1995) Structural features of a polypeptide carrier promoting secretion of a beta-lactamase fusion protein in yeast. Yeast 11, 1381-1391.


The term "signal peptide" is understood to mean a pre-sequence which is predominantly hydrophobic in nature and present as an N-terminal sequence of the precursor form of an extracellular protein, preferably when expressed in yeast. The function of the signal peptide is to allow the expressed protein that is to be secreted to enter the endoplasmic reticulum. The signal peptide is normally cleaved off in the course of this process. The signal peptide may be heterologous or homologous to the organism producing the protein.
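The "predominantly hydrophobic" character of a signal peptide can be quantified, for instance, as the mean Kyte-Doolittle hydropathy of its residues; this is one common choice of scale, not one prescribed here. The sketch compares the widely used 19-residue S. cerevisiae α-factor pre-sequence with the LA23 leader of Table 1. A positive mean for the signal peptide and a negative one for the leader reflect the contrast between a hydrophobic pre-sequence and a hydrophilic pro-sequence.

```python
# Kyte-Doolittle hydropathy values per residue (one-letter code).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle value over all residues of seq."""
    return sum(KD[residue] for residue in seq) / len(seq)


ALPHA_FACTOR_PRE = "MRFPSIFTAVLFAASSALA"  # 19-residue MFalpha1 pre-sequence
LA23 = "QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLISMAKR"  # leader from Table 1

print(f"signal peptide: {mean_hydropathy(ALPHA_FACTOR_PRE):+.2f}")
print(f"LA23 leader:    {mean_hydropathy(LA23):+.2f}")
```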

The expression "polypeptide" is intended to indicate a heterologous polypeptide, i.e. a polypeptide or protein which is not produced in nature by the host organism, preferably yeast, as well as a homologous polypeptide, i.e. a polypeptide which is produced in nature by the host organism, preferably a yeast, and any preform thereof. In a preferred embodiment, the DNA construct of the present invention encodes a heterologous polypeptide.



The expression "a codable amino acid" is intended to indicate an amino acid which can be coded for by a triplet ("codon") of nucleotides.

When, in the amino acid sequences given in the present specification, the one- or three-letter codes of two amino acids, separated by a slash, are given in brackets, e.g. (D/E), this is intended to indicate that the sequence has either the one or the other of these amino acids in the pertinent position.

The expression "heterologous protein" is intended to indicate a protein or polypeptide which is not produced by the host organism in nature; preferably, the protein or polypeptide is heterologous to yeast.

The expression "spacer peptide" is intended to indicate an oligopeptide sequence of one or more amino acid residues, preferably 1 to 12 amino acid residues, more preferably about 4 to 6 amino acid residues, such as EEAEPK, EEGEPK, E(EA)3EPK, and EEPK, which may include a processing site, preferably situated N-terminally and/or C-terminally.



BRIEF DESCRIPTION OF THE DRAWINGS



The present invention is further illustrated with reference to the appended drawings 
wherein 

Fig. 1 shows the expression plasmid pAK773 containing genes expressing the N-terminally extended polypeptides of the invention. In Fig. 1 the following symbols are used:
TPI-PROMOTER: Denotes the TPI gene promoter sequence from S. cerevisiae.
2: Denotes the region encoding a signal/leader peptide (e.g. the YAP3 signal peptide and the LA19 leader peptide in conjunction with the EEGEPK N-terminally extended MI3 insulin precursor).
TPI-TERMINATOR: Denotes the TPI gene terminator sequence of S. cerevisiae.
TPI-POMBE: Denotes the TPI gene from S. pombe.
Origin: Denotes a sequence from the S. cerevisiae 2µ plasmid including its origin of DNA replication in S. cerevisiae.
AMP-R: Sequence from pBR322/pUC13 including the ampicillin resistance gene and an origin of DNA replication in E. coli.

Fig. 2 shows an example of a DNA sequence, pAK855 (SEQ ID No. 1), encoding the YAP3 signal peptide, a leader without potential N-linked glycosylation sites (the TA57 leader), and the EEGEPK-MI3 insulin precursor complex.

Fig. 3 shows an example of a DNA sequence (SEQ ID No. 2) encoding the YAP3 signal peptide, a leader without potential N-linked glycosylation sites (the leader TA67), and the MI3 insulin precursor complex without N-terminal extension.

Fig. 4 shows the expression plasmid pAK855 containing genes expressing the 
leader sequences of the invention. 

Fig. 5 shows in vitro conversion of LA34/IP fusion protein by Achromobacter lyticus 
lysyl specific protease as a plot of the conversion of LA34/IP fusion protein by 
Sepharose-bound Achromobacter lyticus lysyl specific protease vs. time. A 
curve for a first order reaction with (pseudo-)equilibrium is fitted to the data 
points. 

Fig. 6 shows mass spectrometry of in vitro maturation of purified LA34 prepro-leader insulin precursor (MI3) fusion protein by Achromobacter lyticus lysyl-specific endoprotease.
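The curve fitted in Fig. 5 can be written, assuming the usual form for a first-order approach to equilibrium, as x(t) = x_eq(1 - exp(-kt)), where x is the fraction converted, k the rate constant and x_eq the (pseudo-)equilibrium conversion. The sketch below fits k and x_eq by least squares with the standard library only: for a fixed k the model is linear in x_eq, so x_eq is solved in closed form and k is found by golden-section search. The data points are synthetic, generated from chosen parameters, and are not the measurements of Fig. 5.

```python
import math


def conversion(t: float, k: float, x_eq: float) -> float:
    """First-order approach to a (pseudo-)equilibrium conversion x_eq."""
    return x_eq * (1.0 - math.exp(-k * t))


def _profile(times, observed, k):
    """For fixed k, solve the linear least-squares problem for x_eq; return (x_eq, SSE)."""
    f = [1.0 - math.exp(-k * t) for t in times]
    x_eq = sum(fi * xi for fi, xi in zip(f, observed)) / sum(fi * fi for fi in f)
    sse = sum((xi - x_eq * fi) ** 2 for fi, xi in zip(f, observed))
    return x_eq, sse


def fit_first_order(times, observed, k_lo=1e-4, k_hi=1.0, tol=1e-10):
    """Golden-section search on k, with x_eq profiled out at each trial k."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = k_lo, k_hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if _profile(times, observed, c)[1] < _profile(times, observed, d)[1]:
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    k = (a + b) / 2.0
    return k, _profile(times, observed, k)[0]


# Synthetic, noise-free example data (NOT the Fig. 5 measurements).
times = [0, 10, 20, 30, 45, 60, 90, 120]  # illustrative time points
observed = [conversion(t, 0.05, 0.9) for t in times]
k_fit, x_eq_fit = fit_first_order(times, observed)
print(f"k = {k_fit:.4f}, x_eq = {x_eq_fit:.3f}")
```

With real, noisy measurements the same profiling approach applies, but the search interval for k should be chosen to bracket the expected rate constant.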



DETAILED DISCLOSURE OF THE INVENTION 



Preferred leader sequences of the invention are shown in Table 1 below. 
Table 1

Strain    Leader No.   Leader sequence                                 SEQ ID No.
yAK744    LA23         QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLISMAKR      3
yAK857    TA54         QPIDDTESQTTSVNLMADDTESRFATQTPLALDWNLISMAKR      4
yAK858    TA56         QPIDDTESQTTSVNLMADDTESRFATNTNALDWNLISMAKR       5
yAK862    TA57         QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGLISMAKR     6
yAK861    TA59         QPIDDTESQTTSVNLMADDTESAFATQTTSVGGLDWGLISMAKR    7
          LA64         QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLPGAKR       8
          TA65         QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGLPGAKR      9
          TA101        QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGLISMA       10
          TA67         QPIDDTESGTTSVNLMADDTESAFATQTTSVGGLDWRIPRAKR     11
          TA68         QPIDDTESQTTSVNLMADDTESAFATQTPLALDWNLISMAKR      12
          LA34         QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLISMA        13
          TA76         QPIDDTESQTTSVNLMADDTESRFATQTTLPGAKR             14
          TA77         QPIDDTESQTTSVNLMADDTESRALDWNLPGAKR              15
          TA78         QPIDDTESQTTSVNLMFATQTTLALDWNLPGAKR              16
          TA79         QPIDDTESQADDTESRFATQTTLALDWNLPGAKR              17
          TA80         QPTTSVNLMADDTESRFATQTTLALDWNLPGAKR              18
          TA89         QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGNTTLISMAKR  19
          TA90         QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGLINTTMAKR   20



In the sequences of Table 1 the C-terminal KR defines a dibasic protease processing 
site. 
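The leader peptides of the invention are defined as lacking N-linked glycosylation. At the level of sequence alone, this can be screened for by searching for the Asn-X-Ser/Thr motif (X not Pro) recognised by the N-glycosylation machinery. A minimal sketch; it flags potential sites only and does not predict whether a motif is actually glycosylated. Run on a few Table 1 entries, it reports no motif in LA23, TA57 or LA34 and a single Asn-Thr-Thr motif in each of TA89 and TA90.

```python
import re

# Potential N-glycosylation motif: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping motifs (e.g. in NNST) all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def n_glycosylation_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (1-based Asn position, motif) for every Asn-X-Ser/Thr motif in seq."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


TABLE_1_SUBSET = {
    "LA23": "QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLISMAKR",
    "TA57": "QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGLISMAKR",
    "LA34": "QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLISMA",
    "TA89": "QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGNTTLISMAKR",
    "TA90": "QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGLINTTMAKR",
}

for name, leader in TABLE_1_SUBSET.items():
    print(name, n_glycosylation_motifs(leader))
```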

Further preferred leader sequences of the invention are shown in Tables 1a and 1b below.



Table 1a

Leader No.   Leader sequence                                                                 SEQ ID No.
TA75         QPIDD(A/D)E(A/D)Q(A/D)(A/D)(A/D)VNLMADD(A/D)E(A/D)AFA(A/D)Q(A/D)PLALDWNLISMA    21
TA75.50      QPIDDAEAQAAAVNLMADDDEGFAAQAPLALDWNLISMA                                         22
TA75.15      QPIDDAEAQDDDVNLMADDDGRFADQAPLALDWNLISMA                                         23
TA75.4       QPIDDAEAQDAAVNLMADDGRLKIRFAAQAPLALDWNLISMA                                      24
TA75.51      QPIDDAEDQAAAVNLMADDDEDGFAAGAPLALDWNLISMA                                        25
TA75.58      QPIDDAEAQDDDVNLMADDDGRFAAQAPLALDWNLISMA                                         26
TA75.64      QPIDDAEAQDDDVNLMADDDGRFAAQAPLALDWNLISViA                                        27
             and any of the above where SMA is replaced by X1MA, wherein X1 may be any
             codable amino acid, preferably a hydrophilic amino acid                         28



Table 1b

Leader No.   Leader sequence                         SEQ ID No.
TA91         QPTTSVNLMADDTESAFATQTNSGGLDWGLISMAKR    29
TA92         QPIDDTESQADDTESAFATQTNSGGLDWGLISMAKR    30
TA93         QPIDDTESQTTSVNLMFATQTNSGGLDWGLISMAKR    31
TA94         QPIDDTESGTTSVNLMADDTESAGGLDWGLISMAKR    32
TA95         QPIDDTESGTTSVNLMADDTESAFATQTNSLISMAKR   33
TA96         QPIDDTESQTTSVNLMADDTESAFATQTNSGGLMAKR   34
TA97         QPIDDTESGTTSVNLMADDTESALISMAKR          35
TA98         QPIDDTESQTTSVNLMLISMAKR                 36
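Comparing the entries of Table 1b with TA57 of Table 1 suggests that several of them are TA57 with one internal stretch removed; this reading is our own inference from the sequences, not a statement in the text. A sketch using the standard-library `difflib.SequenceMatcher` lists each difference between a parent leader and a variant, with positions numbered from 1 in the parent.

```python
from difflib import SequenceMatcher

TA57 = "QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGLISMAKR"


def differences(parent: str, variant: str) -> list[tuple[str, int, str, str]]:
    """List (operation, 1-based start in parent, parent segment, variant segment)."""
    matcher = SequenceMatcher(None, parent, variant, autojunk=False)
    return [
        (tag, i1 + 1, parent[i1:i2], variant[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


TABLE_1B_SUBSET = {
    "TA91": "QPTTSVNLMADDTESAFATQTNSGGLDWGLISMAKR",
    "TA92": "QPIDDTESQADDTESAFATQTNSGGLDWGLISMAKR",
    "TA93": "QPIDDTESQTTSVNLMFATQTNSGGLDWGLISMAKR",
}
for name, variant in TABLE_1B_SUBSET.items():
    print(name, differences(TA57, variant))
```

For TA91, TA92 and TA93 the comparison reports a single deletion of seven residues each, at positions 3, 10 and 17 of TA57 respectively.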



The heterologous protein or polypeptide produced by the method of the invention may be any protein which may advantageously be produced in yeast. Preferred examples of such proteins are aprotinin, tissue factor pathway inhibitor or other protease inhibitors, insulin or insulin precursors, insulin analogues, insulin-like growth factors, such as IGF I and IGF II, human or bovine growth hormone, interleukin, tissue plasminogen activator, glucagon, glucagon-like peptide-1 (GLP-1), glucagon-like peptide-2 (GLP-2), GRPP, Factor VII, Factor VIII, Factor XIII, platelet-derived growth factor, enzymes, such as lipases, or a functional analogue of any one of these proteins. More preferred proteins are precursors of insulin and insulin-like growth factors, and especially the smaller peptides of the proglucagon family, such as glucagon, GLP-1, GLP-2, and GRPP, including truncated forms, such as GLP-1(1-45), GLP-1(1-39), GLP-1(1-38), GLP-1(1-37), GLP-1(1-36), GLP-1(1-35), GLP-1(1-34), GLP-1(7-45), GLP-1(7-39), GLP-1(7-38), GLP-1(7-37), GLP-1(7-36), GLP-1(7-35), and GLP-1(7-34).

In the present context, the term "functional analogue" is meant to indicate a polypeptide with a similar function as the native protein (this is intended to be understood as relating to the nature rather than the level of biological activity of the native protein). The polypeptide may be structurally similar to the native protein and may be derived from the native protein by addition of one or more amino acids to either or both the C- and N-terminal end of the native protein, substitution of one or more amino acids at one or a number of different sites in the native amino acid sequence, deletion of one or more amino acids at either or both ends of the native protein or at one or several sites in the amino acid sequence, or insertion of one or more amino acids at one or more sites in the native amino acid sequence. Such modifications are well known for several of the proteins mentioned above.


The precursors of insulin, including proinsulin as well as precursors having a truncated and/or modified C-peptide or completely lacking a C-peptide, precursors of insulin analogues, and insulin-related peptides, such as insulin-like growth factors, may be of human origin or from other animals, and may be of recombinant or semisynthetic origin. The cDNA used for expression of the precursors of insulin, precursors of insulin analogues, or insulin-related peptides in the method of the invention includes codon-optimised forms for expression in yeast.

By "a precursor of insulin" or "a precursor of an insulin analogue" is to be understood a single-chain polypeptide which, by one or more subsequent chemical and/or enzymatic processes, can be converted to a two-chain insulin or insulin analogue molecule having the correct establishment of the three disulphide bridges as found in natural human insulin. Preferred insulin precursors are MI1, B(1-29)-A(1-21); MI3, B(1-29)-Ala-Ala-Lys-A(1-21) (as described in e.g. EP 163 529); X14, B(1-27-Asp-Lys)-Ala-Ala-Lys-A(1-21) (as described in e.g. PCT publication No. 95/00550); B(1-27-Asp-Lys)-A(1-21); B(1-27-Asp-Lys)-Ser-Asp-Asp-Ala-Lys-A(1-21); B(1-29)-Ala-Ala-Arg-A(1-21) (as described in e.g. PCT publication No. 95/07931); MI5, B(1-29)-Ser-Asp-Asp-Ala-Lys-A(1-21); and B(1-29)-Ser-Asp-Asp-Ala-Arg-A(1-21); and more preferably MI1, B(1-29)-A(1-21), MI3, B(1-29)-Ala-Ala-Lys-A(1-21), and MI5, B(1-29)-Ser-Asp-Asp-Ala-Lys-A(1-21).
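Figs. 5 and 6 refer to in vitro maturation of an LA34/MI3 fusion protein with the Achromobacter lyticus lysyl-specific protease, which cleaves on the carboxy side of Lys. The sketch builds MI3 from the human insulin chain sequences (B(1-29), then Ala-Ala-Lys, then A(1-21)) and applies a simplified Lys-specific cut. The junction between the LA34 leader and MI3 is not given in this section; the EEGEPK spacer used here, borrowed from the Fig. 1 construct, is an assumption.

```python
B_1_29 = "FVNQHLCGSHLVEALYLVCGERGFFYTPK"  # human insulin B-chain, residues B1-B29
A_1_21 = "GIVEQCCTSICSLYQLENYCN"          # human insulin A-chain, residues A1-A21
MI3 = B_1_29 + "AAK" + A_1_21             # B(1-29)-Ala-Ala-Lys-A(1-21)

LA34 = "QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLISMA"  # Table 1 leader without KR
SPACER = "EEGEPK"  # hypothetical junction between leader and precursor


def lys_specific_digest(seq: str) -> list[str]:
    """Cut on the carboxy side of every Lys (simplified lysyl endopeptidase model)."""
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        if residue == "K":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


for fragment in lys_specific_digest(LA34 + SPACER + MI3):
    print(fragment)
```

The digest releases the leader-spacer fragment, B(1-29), the Ala-Ala-Lys bridge and A(1-21). In the folded precursor, B(1-29) and A(1-21) stay joined by the disulphide bridges, which the string model ignores, so the result corresponds to des(B30) human insulin.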


Examples of insulins or insulin analogues which can be produced in this way are human insulin, preferably des(B30) human insulin, porcine insulin, and insulin analogues wherein at least one Lys or Arg is present, preferably insulin analogues wherein PheB1 has been deleted, insulin analogues wherein the A-chain and/or the B-chain have an N-terminal extension, and insulin analogues wherein the A-chain and/or the B-chain have a C-terminal extension. Other preferred insulin analogues are those wherein one or more of the amino acid residues, preferably one, two, or three of them, have been substituted by another codable amino acid residue. Thus, in position A21 a parent insulin may, instead of Asn, have an amino acid residue selected from the group comprising Ala, Gln, Glu, Gly, His, Ile, Leu, Met, Ser, Thr, Trp, Tyr, or Val, in particular an amino acid residue selected from the group comprising Gly, Ala, Ser, and Thr. Likewise, in position B28 a parent insulin may, instead of Pro, have an amino acid residue selected from the group comprising Asp and Lys, preferably Asp, and in position B29 a parent insulin may, instead of Lys, have the amino acid Pro. The insulin analogues may also be modified by a combination of the changes outlined above. The expression "a codable amino acid residue" as used herein designates an amino acid residue which can be coded for by the genetic code, i.e. a triplet ("codon") of nucleotides.
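The position-specific substitutions listed above can be expressed as edits on the one-letter sequences of the two chains of human insulin. A minimal sketch: the helper `substitute` and the table `ALLOWED` are hypothetical illustrations, only positions A21, B28 and B29 are checked against the residues named in the text, and any other position is accepted without validation.

```python
HUMAN_INSULIN = {
    "A": "GIVEQCCTSICSLYQLENYCN",           # A1-A21
    "B": "FVNQHLCGSHLVEALYLVCGERGFFYTPKT",  # B1-B30
}

# Replacement residues named in the text, keyed by (chain, position).
ALLOWED = {
    ("A", 21): set("AQEGHILMSTWYV"),  # instead of Asn
    ("B", 28): set("DK"),             # instead of Pro
    ("B", 29): set("P"),              # instead of Lys
}


def substitute(chains: dict[str, str], chain: str, position: int, residue: str) -> dict[str, str]:
    """Return a new chain dictionary with one residue replaced (1-based position)."""
    allowed = ALLOWED.get((chain, position))
    if allowed is not None and residue not in allowed:
        raise ValueError(f"{residue} is not a listed substitution at {chain}{position}")
    seq = chains[chain]
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} is outside chain {chain}")
    i = position - 1
    return {**chains, chain: seq[:i] + residue + seq[i + 1:]}


b28_asp = substitute(HUMAN_INSULIN, "B", 28, "D")
b28_lys_b29_pro = substitute(substitute(HUMAN_INSULIN, "B", 28, "K"), "B", 29, "P")
print(b28_asp["B"])
print(b28_lys_b29_pro["B"])
```

Chaining two calls, as in the B28 Lys plus B29 Pro example, corresponds to the combination of changes permitted above; the parent dictionary is never modified in place.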



The signal sequence (SP) may encode any signal peptide which ensures an effective direction of the expressed polypeptide into the secretory pathway of the cell. The signal peptide may be a naturally occurring signal peptide or functional parts thereof, or it may be a synthetic peptide. Suitable signal peptides have been found to be the α-factor signal peptide, the signal peptide of mouse salivary amylase, a modified carboxypeptidase signal peptide, the yeast BAR1 signal peptide or the Humicola lanuginosa lipase signal peptide or a derivative thereof. The mouse salivary amylase signal sequence is described by Hagenbuchle, O. et al., Nature 289 (1981) 643-646. The carboxypeptidase signal sequence is described by Valls, L.A. et al., Cell 48 (1987) 887-897. The BAR1 signal peptide is disclosed in WO 87/02670. The yeast aspartic protease 3 signal peptide is described in Danish patent application No. 0828/93.


The yeast processing site encoded by the DNA sequence PS may suitably be any paired combination of Lys and Arg, such as LysArg, ArgLys, ArgArg or LysLys, which permits processing of the polypeptide by the KEX2 protease of Saccharomyces cerevisiae or the equivalent protease in other yeast species (Julius, D.A. et al., Cell 37 (1984) 1075). If KEX2 processing is not convenient, e.g. if it would lead to cleavage of the polypeptide product due to the presence of two consecutive basic amino acids internally in the desired product, a processing site for another protease may be selected comprising an amino acid combination which is not found in the polypeptide product, e.g. the processing site for FXa, IleGluGlyArg (cf. Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 1989).
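The selection rule above, choosing a processing site whose recognition sequence does not occur in the product, can be checked directly on the product sequence. A sketch under simplifying assumptions: each protease is reduced to a short motif (real specificity also depends on flanking residues), the lysyl-specific endoprotease used for in vitro maturation in Figs. 5 and 6 is added as a further candidate, and human proinsulin serves as an example of a product with internal dibasic sites. The names `CANDIDATE_SITES` and `usable_processing_sites` are illustrative.

```python
import re

# Simplified recognition motifs in one-letter code.
CANDIDATE_SITES = {
    "KEX2 (dibasic pair)": r"[KR][KR]",
    "Factor Xa (IleGluGlyArg)": r"IEGR",
    "lysyl-specific endoprotease (Lys)": r"K",
}


def usable_processing_sites(product: str) -> list[str]:
    """Return the candidate sites whose motif does not occur inside the product."""
    return [
        name for name, motif in CANDIDATE_SITES.items()
        if re.search(motif, product) is None
    ]


HUMAN_PROINSULIN = (
    "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"                # B-chain
    "RR" "EAEDLQVGQVELGGGPGAGSLQPLALEGSLQ" "KR"     # C-peptide flanked by dibasic sites
    "GIVEQCCTSICSLYQLENYCN"                         # A-chain
)
MI3 = "FVNQHLCGSHLVEALYLVCGERGFFYTPK" "AAK" "GIVEQCCTSICSLYQLENYCN"

print(usable_processing_sites(HUMAN_PROINSULIN))
print(usable_processing_sites(MI3))
```

For proinsulin only the Factor Xa site remains usable. MI3, which contains Lys residues but no basic pair, is compatible with KEX2 processing; its internal Lys residues fail the check only because, for MI3, they are exactly where the lysyl-specific enzyme is meant to cut during maturation.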

Two of the preferred DNA constructs encoding leader sequences are incorporated in SEQ ID Nos. 1 and 2, as shown in Fig. 2, codons 1078-1209, and Fig. 3, codons 1028-1206, or suitable modifications thereof. Examples of suitable modifications of the DNA sequence are nucleotide substitutions which do not give rise to another amino acid sequence of the protein, but which may correspond to the codon usage of the organism, preferably a fungal organism, such as a yeast, into which the DNA construct is inserted, or nucleotide substitutions which do give rise to a different amino acid sequence and therefore, possibly, a different protein structure. Other examples of possible modifications are insertion of one or more codons into the sequence, addition of one or more codons at either end of the sequence, and deletion of one or more codons at either end of or within the sequence.
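The two kinds of nucleotide substitution mentioned above, those that keep the encoded amino acid sequence (for instance to match the codon usage of the host) and those that change it, can be told apart by translating both versions with the standard genetic code. A sketch assuming an in-frame coding sequence without introns and the standard code used by S. cerevisiae nuclear genes; the short DNA fragments are invented for illustration (they encode Gln-Pro-Ile, the first residues of the leaders) and are not taken from SEQ ID Nos. 1 and 2.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TTT, TTC, TTA, TTG, TCT, ... order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence codon by codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_silent_modification(original: str, modified: str) -> bool:
    """True if the modified DNA encodes exactly the same amino acid sequence."""
    return translate(original) == translate(modified)


print(translate("CAGCCAATC"))                            # QPI
print(is_silent_modification("CAGCCAATC", "CAACCTATT"))  # True: synonymous codons
print(is_silent_modification("CAGCCAATC", "CAGCCAATG"))  # False: Ile replaced by Met
```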

One aspect of the invention is a recombinant expression vector carrying any one of the expression cassettes

5'-P-SP-LP-(PS)-(S)-(PS)-*gene*-(T)i-3'

5'-P-SP-LP-PS-*gene*-(T)i-3'

5'-P-SP-LP-S-PS-*gene*-(T)i-3'

5'-P-SP-LP-PS-S-*gene*-(T)i-3'

5'-P-SP-LP-S-*gene*-(T)i-3'

5'-P-SP-LP-*gene*-(T)i-3'

5'-P-SP-LP-PS-S-PS-*gene*-(T)i-3'

wherein P is a promoter sequence; SP, LP, PS, S, and *gene* are as defined above; T is a suitable terminator, e.g. the TPI terminator (cf. Alber, T. and Kawasaki, G., J. Mol. Appl. Genet. 1 (1982) 419-434); and i is 1 or 0. The vector may be any vector which is capable of replicating in yeast organisms. The promoter may be any DNA sequence which shows transcriptional activity in yeast and may be derived from genes encoding proteins either homologous or heterologous to yeast. The promoter is preferably derived from a gene encoding a protein homologous to yeast. Examples of suitable promoters for use in yeast host cells are the Saccharomyces cerevisiae MFα1, TPI, ADH and PGK promoters. The vector may also comprise the yeast 2µ plasmid replication genes REP 1-3 and origin of replication, and a selectable marker, e.g. the Schizosaccharomyces pombe TPI gene as described by Russell, P.R., Gene 40 (1985) 125-130.
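The six constructs and the corresponding cassettes, each with or without the terminator (i = 1 or 0), follow mechanically from the bracketed notation. A small sketch that expands the notation; excluding two directly adjacent processing sites is an assumption inferred from the fact that SP-LP-PS-PS-*gene* is not among the listed structures, and the function names are illustrative.

```python
from itertools import product


def construct_structures() -> list[str]:
    """Expand SP-LP-(PS)-(S)-(PS)-*gene*, treating each bracketed element as optional."""
    structures = []
    for ps1, spacer, ps2 in product((True, False), repeat=3):
        if ps1 and ps2 and not spacer:
            continue  # PS-PS with no spacer between them is not among the listed structures
        parts = ["SP", "LP"]
        if ps1:
            parts.append("PS")
        if spacer:
            parts.append("S")
        if ps2:
            parts.append("PS")
        parts.append("*gene*")
        name = "-".join(parts)
        if name not in structures:  # a single PS arises twice (first or second position)
            structures.append(name)
    return structures


def expression_cassettes() -> list[str]:
    """Prefix the promoter P and append the terminator T when i = 1."""
    return [
        f"5'-P-{s}{'-T' if i == 1 else ''}-3'"
        for s in construct_structures()
        for i in (1, 0)
    ]


for cassette in expression_cassettes():
    print(cassette)
```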



The expression vector of the invention may be any expression vector that is conveniently subjected to recombinant DNA procedures, and the choice of vector will often depend on the host cell into which the vector is to be introduced. Thus, the vector may be an autonomously replicating vector, i.e. a vector which exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g. a plasmid. Alternatively, the vector may be one which, when introduced into a host cell, is integrated into the host cell genome and replicated together with the chromosome(s) into which it has been integrated.
The methods used to ligate the sequence 5'-P-SP-LP-PS-*gene*-(T)i-3' and to insert it into suitable yeast vectors containing the information necessary for yeast replication are well known to persons skilled in the art (cf., for instance, Sambrook, J., Fritsch, E.F. and Maniatis, T., op. cit.). It will be understood that the vector may be constructed either by first preparing a DNA construct containing the entire sequence 5'-P-SP-LP-PS-*gene*-(T)i-3' and subsequently inserting this fragment into a suitable expression vector, or by sequentially inserting into a suitable vector DNA fragments containing genetic information for the individual elements (such as the promoter sequence, the signal peptide, the leader sequence GlnProIle(Asp/Glu)(Asp/Glu)X1(Glu/Asp)X2AsnZ(Thr/Ser)X3, the processing site, the polypeptide, and, if present, the terminator sequence), followed by ligation.

In a further aspect, the present invention relates to a process for producing a polypeptide (or protein) in yeast, the process comprising culturing a yeast cell, which is capable of expressing said polypeptide and which is transformed with a yeast expression vector as described above including a leader peptide sequence of the invention, in a suitable medium to obtain expression and secretion of said polypeptide, after which the polypeptide is recovered from the medium. The term "culturing" includes fermenting a yeast under laboratory and industrial conditions to produce the polypeptide of interest.

Yeasts are fungi of the class Ascomycetes, subclass Hemiascomycetidae. The yeast
organism used in the method of the invention may be any suitable yeast organism
which, on cultivation, produces large amounts of the desired polypeptide. Examples of
suitable yeast organisms are strains of the yeast species Saccharomyces
cerevisiae, Saccharomyces kluyveri, Saccharomyces uvarum, Schizosaccharomyces
pombe, Kluyveromyces lactis, Hansenula polymorpha, Pichia pastoris, Pichia
methanolica, Pichia kluyveri, Yarrowia lipolytica, Candida sp., Candida utilis, Candida
cacaoi, Geotrichum sp. and Geotrichum fermentans. It will be obvious to the person
skilled in the art that any other fungal cell, such as cells of the genus
Aspergillus, may be selected as the host organism.

The transformation of the yeast cells may, for instance, be effected by protoplast
formation followed by transformation in a manner known per se. The medium used to
cultivate the cells may be any conventional medium suitable for growing yeast
organisms. The secreted polypeptide, a significant proportion of which will be present
in the medium in correctly processed form, may be recovered from the medium by
conventional procedures including separating the yeast cells from the medium by
centrifugation or filtration, precipitating the proteinaceous components of the
supernatant or filtrate by means of a salt, e.g. ammonium sulphate, followed by
purification by a variety of chromatographic procedures, e.g. ion exchange
chromatography, affinity chromatography or the like.

The invention is further described in the following examples, which are not to be
construed as limiting the scope of the invention as claimed.



EXAMPLES 


Construction of the yeast strain expressing the insulin precursor mediated by leaders 
lacking N-linked glycosylation. 



Synthetic genes encoding leaders that lack amino acid sequences potentially subject
to N-linked glycosylation, fused to the insulin precursor with or without an N-terminal
extension, were constructed using the polymerase chain reaction (PCR).
Oligonucleotides for PCR were synthesised on an automatic DNA synthesizer
(Applied Biosystems model 380A) using phosphoramidite chemistry and commercially
available reagents (Beaucage, S.L. and Caruthers, M.H., Tetrahedron Letters 22 (1981)
1859-1869). The PCR was performed using Pwo DNA polymerase or EHF polymerase
(Boehringer Mannheim GmbH, Sandhoefer Strasse 116, Mannheim, Germany) according
to the manufacturer's instructions, and the PCR mix was overlaid with 100 µl mineral
oil (Sigma Chemical Co., St. Louis, MO, USA).

PCR mix:

5 µl oligonucleotide (50 pmol)
5 µl oligonucleotide (50 pmol)
10 µl 10X PCR buffer
10 µl dNTP mix
0.5 µl Pwo or EHF enzyme
0.5 µl pAK680 plasmid as template (0.2 µg DNA)
71 µl dest. water

A total of 12 cycles were performed, each cycle consisting of 94 °C for 45 s, 40 °C for
1 min and 72 °C for 1.5 min. The PCR mixture was then loaded onto a 2.5% agarose gel
and electrophoresis was performed using standard techniques (Sambrook J, Fritsch EF
and Maniatis T, Molecular Cloning, Cold Spring Harbor Laboratory Press, 1989). The
resulting DNA fragment was cut out of the agarose gel and isolated using the Gene Clean
kit (Bio 101 Inc., PO Box 2284, La Jolla, CA 92038, USA) according to the
manufacturer's instructions.
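The cycling program above is simple arithmetic: 45 s + 60 s + 90 s = 195 s per cycle, or 2340 s (39 min) of hold time for 12 cycles. The short Python sketch below makes this explicit. The helper name `program_hold_time` is hypothetical, the step labels reflect the usual reading of a 94/40/72 °C program, and instrument ramp times, which lengthen a real run, are deliberately ignored.

```python
# Sketch: total hold time of the 12-cycle PCR program described above.
# Assumption: ramp times between temperatures are ignored.
CYCLES = 12
STEPS = [  # (temperature in degrees C, hold time in seconds) for one cycle
    (94, 45),  # denaturation
    (40, 60),  # annealing
    (72, 90),  # extension
]


def program_hold_time(cycles, steps):
    """Return the summed hold time in seconds for `cycles` repeats of `steps`."""
    return cycles * sum(seconds for _, seconds in steps)


total = program_hold_time(CYCLES, STEPS)
print(f"{total} s = {total / 60:.0f} min")  # prints "2340 s = 39 min"
```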

Certain leader DNA sequences were constructed by overlap extension PCR as
described by Horton, R.M., Cai, Z., Ho, S.N. and Pease, L.R.: Gene splicing by overlap
extension: tailor-made genes using the polymerase chain reaction. Biotechniques 8
(1990) 528-535.
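In overlap extension, two PCR products whose ends were made identical by the primers are annealed through that shared stretch and extended into one fused fragment. As a minimal in-silico sketch (not the authors' protocol), the hypothetical function below predicts the spliced product from two fragment sequences by finding the longest identical overlap between the 3' end of the left fragment and the 5' end of the right one. The example fragments are illustrative pieces of the Fig. 2 sequence, and the 10-base minimum overlap is an arbitrary assumption.

```python
def splice_by_overlap(left, right, min_overlap=10):
    """Predict the product of splicing by overlap extension.

    Looks for the longest stretch (>= min_overlap bases) that is identical at the
    3' end of `left` and the 5' end of `right` (both written 5'->3' on the same
    strand) and returns the fused sequence containing that stretch once.
    """
    left, right = left.upper(), right.upper()
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    raise ValueError(f"no overlap of at least {min_overlap} bases")


# Illustrative fragments (taken from the Fig. 2 sequence) sharing GTAAGATCTGCG.
left_fragment = "ATGAAACTGAAAACTGTAAGATCTGCG"
right_fragment = "GTAAGATCTGCGGTCCTTTCG"
print(splice_by_overlap(left_fragment, right_fragment))
# prints ATGAAACTGAAAACTGTAAGATCTGCGGTCCTTTCG
```

In the laboratory the overlap is engineered into the inner primers; the function only checks that such an overlap exists and shows the expected joined sequence.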

The purified PCR DNA fragment was dissolved in distilled water and restriction
endonuclease buffer and typically cut with the restriction endonucleases BglII and
NcoI according to standard techniques (Sambrook J, Fritsch EF and Maniatis T,
Molecular Cloning, Cold Spring Harbor Laboratory Press, 1989). The NcoI-XbaI DNA
fragment of 209 nucleotide base pairs was subjected to agarose electrophoresis and
purified using the Gene Clean kit as described.
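The 209 bp figure can be cross-checked against the sequence printed in Fig. 2. The sketch below is a simplified restriction mapper written for that purpose: the `ENZYMES` table and `cut_positions` helper are illustrative, not a real library API. It uses the standard NcoI, XbaI and BglII recognition sites, ignores methylation sensitivity, and assumes that the NcoI-XbaI region of pAK855 shown in Fig. 2 (positions 1191-1410, top strand as printed) is representative of the corresponding fragment discussed in the text. Measured between the top-strand cut positions, the NcoI-XbaI piece comes out at 209 nucleotides.

```python
import re

# Recognition site and top-strand cut offset (bases from the start of the site).
ENZYMES = {
    "BglII": ("AGATCT", 1),  # A^GATCT
    "NcoI": ("CCATGG", 1),   # C^CATGG
    "XbaI": ("TCTAGA", 1),   # T^CTAGA
}

# Top strand of Fig. 2, positions 1191-1410, copied from the figure.
FIG2_1191_1410 = """
GATCTCCATG
GCTAAGAGAG AAGAAGGTGA ACCAAAGTTC GTTAACCAAC ACTTGTGCGG
TTCCCACTTG GTTGAAGCTT TGTACTTGGT TTGCGGTGAA AGAGGTTTCT
TCTACACTCC TAAGGCTGCT AAGGGTATTG TCGAACAATG CTGTACCTCC
ATCTGCTCCT TGTACCAATT GGAAAACTAC TGCAACTAGA CGCAGCCCGC
AGGCTCTAGA
"""


def cut_positions(seq, enzyme):
    """0-based top-strand cut positions (the cut falls just before this index)."""
    site, offset = ENZYMES[enzyme]
    # The zero-width lookahead also finds overlapping occurrences of a site.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


seq = "".join(FIG2_1191_1410.split())  # drop the spaces and line breaks
nco = cut_positions(seq, "NcoI")
xba = cut_positions(seq, "XbaI")
print(nco, xba, xba[0] - nco[0])  # prints [6] [215] 209
```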

The expression plasmid pAK721 or a similar plasmid of the cPOT type (see Fig. 1) was
typically cut with the restriction endonucleases BglII and XbaI, and the vector fragment
of 10849 nucleotide base pairs was isolated using the Gene Clean kit as described.

Typically, plasmid pAK773 encoding the N-terminally extended EEGEPK-insulin
precursor was cut with the restriction endonucleases NcoI and XbaI, and the DNA
fragment of 209 nucleotide base pairs was isolated using the Gene Clean kit as described.
The three DNA fragments were ligated together using T4 DNA ligase and standard
conditions (Sambrook J, Fritsch EF and Maniatis T, Molecular Cloning, Cold Spring
Harbor Laboratory Press, 1989). The ligation mix was then transformed into a
competent E. coli strain (R-, M+), followed by selection for ampicillin resistance.

Plasmid DNA from the resulting E. coli transformants was isolated using standard
techniques (Sambrook J, Fritsch EF and Maniatis T, Molecular Cloning, Cold Spring
Harbor Laboratory Press, 1989) and checked for the insert with appropriate restriction
endonucleases, i.e. BglII, EcoRI, NcoI and XbaI. The selected plasmid was shown by
DNA sequence analysis (Sequenase, U.S. Biochemical Corp., USA) to encode the
leader-MI3 insulin precursor, with the DNA encoding the leader inserted before the
DNA encoding the MI3 insulin precursor.

An example of a DNA sequence, pAK855 (SEQ ID No. 1), encoding the YAP3 signal
peptide, a leader without potential N-linked glycosylation sites (the TA57 leader) and
the EEGEPK-MI3 insulin precursor is shown in Fig. 2.
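To see how the Fig. 2 sequence encodes the YAP3 signal peptide and the TA57 leader, the coding strand can be translated with the standard genetic code. The Python sketch below does this for positions 1015-1209 (from the ATG start codon through the KR processing site), transcribed from the figure. `translate` is a hypothetical helper, and the split after residue 21 assumes that the leader begins with the QPIDD motif of formula I.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna):
    """Translate a coding-strand DNA sequence (whitespace ignored) codon by codon."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, len(dna), 3))


# Fig. 2, top strand, positions 1015-1209 (ATG start codon through the KR site).
FIG2_1015_1209 = """
ATGAAA CTGAAAACTG TAAGATCTGC GGTCCTTTCG
TCACTCTTTG CATCTCAGGT CCTTGGCCAA CCAATTGACG ACACTGAATC
TCAAACTACT TCTGTCAACT TGATGGCTGA CGACACTGAA TCTGCTTTCG
CTACTCAAAC TAACTCTGGT GGTTTGGATG TTGTTGGTTT GATCTCCATG
GCTAAGAGA
"""

protein = translate(FIG2_1015_1209)
signal_peptide, leader = protein[:21], protein[21:]
print(signal_peptide)  # prints MKLKTVRSAVLSSLFASQVLG
print(leader)          # prints QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDVVGLISMAKR
```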

An example of a DNA sequence (SEQ ID No. 2) encoding the YAP3 signal peptide, a
synthetic leader without potential N-linked glycosylation sites (the TA69 leader) and the
MI3 insulin precursor without N-terminal extension is shown in Fig. 3.
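Whether a leader lacks potential N-linked glycosylation sites can be checked by scanning for the consensus sequon N-X-S/T. The sketch below follows the definition used in claims 36 and 43 (X is any amino acid except proline); claim 6 states the motif without the proline exception, so a stricter scan would simply drop that condition. The TA57 and TA69 sequences are as read from the translations shown in Figs. 2 and 3, and the helper name is hypothetical. A sequon is only a necessary condition: its presence does not guarantee that the site is actually glycosylated.

```python
import re

# N-X-S/T with X != P. The lookahead consumes only the N, so overlapping
# sequons such as the two in "NNST" are both reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein):
    """Return 0-based positions of Asn residues that start an N-X-S/T (X != P) sequon."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


TA57 = "QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDVVGLISMAKR"  # read from Fig. 2
TA69 = "QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDVVGLPGAKR"   # read from Fig. 3
print(find_sequons(TA57), find_sequons(TA69))          # prints [] []
print(find_sequons("ANGTA"), find_sequons("ANPTA"))    # prints [1] []
```

Both leaders contain asparagine (in ...SVNLM... and ...TNSGG...), but neither residue is followed by the S/T required two positions downstream, so no sequon is reported.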

The yeast expression plasmids used are of the C-POT type (see Figs. 1 and 4) and are
similar to those described in EP 171 142; they contain the Schizosaccharomyces
pombe triose phosphate isomerase gene (POT) for plasmid selection and stabilisation
in S. cerevisiae. pAK855 also contains the S. cerevisiae triose phosphate isomerase
promoter and terminator. The promoter and terminator are similar to those described for
the plasmid pKFN1003 (described in WO 90/100075), as are all sequences in the plasmid
except the EcoRI-XbaI fragment encoding the YAP3 signal peptide-leader without
N-linked glycosylation-MI3 insulin precursor with or without N-terminal extension.


Purified LA34/IP fusion protein was processed to desB30 insulin by Sepharose-bound
Achromobacter lyticus lysyl-specific protease (EC 3.4.21.50) (Fig. 5, Fig. 6). From the
RP-HPLC analysis results, the conversion yield for the removal of the LA34 leader from
the IP molecule was calculated for each collected sample and plotted as a function of
the reaction time. A curve for a first-order reaction reaching a (pseudo-)equilibrium can
be fitted to the data points, as shown in Fig. 5 and Fig. 6. Electrospray mass
spectrometry was performed on the proteinaceous material isolated from the two main
peaks eluted in the RP-HPLC fractionation of the final reaction mixture. For the first
eluting peak a Mw of 5706 Da was found, corresponding to des(B30)-human insulin
(calculated Mw: 5706 Da), and for the second peak a Mw of 5625 Da was found,
corresponding to the di-mannosylated LA34-EEAEAEAEPK polypeptide lacking the
dipeptide QP (calculated Mw: 5627 Da), the QP dipeptide presumably having been
removed by dipeptidyl aminopeptidase during secretion. This means that almost
complete cleavage of the precursor to an active desB30 insulin molecule took place
within the reaction time.
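The text does not state how the first-order curve in Figs. 5 and 6 was fitted. One simple way, sketched below under that caveat, is to model the conversion as C(t) = Cmax(1 - e^(-kt)), where Cmax is the (pseudo-)equilibrium plateau and k the apparent rate constant, to search a grid of plateau values, and to estimate k for each candidate by least squares on the linearised form ln(1 - C/Cmax) = -kt. The data points are synthetic, generated from the model itself, and are not the measurements behind Figs. 5 and 6; the function names are hypothetical, and a nonlinear least-squares routine such as SciPy's `curve_fit` would be the usual alternative.

```python
import math


def conversion(t, c_max, k):
    """First-order approach to a plateau: C(t) = c_max * (1 - exp(-k * t))."""
    return c_max * (1.0 - math.exp(-k * t))


def fit_first_order(times, values, c_max_grid):
    """Grid search over the plateau c_max; for each candidate, k is the least-squares
    slope through the origin of ln(1 - C/c_max) = -k * t. Returns (c_max, k, sse)."""
    best = None
    for c_max in c_max_grid:
        if max(values) >= c_max:
            continue  # the logarithm is undefined at or above the plateau
        ys = [math.log(1.0 - c / c_max) for c in values]
        k = -sum(t * y for t, y in zip(times, ys)) / sum(t * t for t in times)
        sse = sum((c - conversion(t, c_max, k)) ** 2 for t, c in zip(times, values))
        if best is None or sse < best[2]:
            best = (c_max, k, sse)
    if best is None:
        raise ValueError("every candidate plateau is below the observed conversion")
    return best


# Synthetic, noise-free points from the model itself (plateau 95 %, k = 0.05 per
# time unit); they are NOT the measurements shown in Figs. 5 and 6.
times = [5, 10, 20, 40, 80, 160]
values = [conversion(t, 95.0, 0.05) for t in times]
grid = [90.0 + 0.5 * i for i in range(21)]  # candidate plateaus 90.0 ... 100.0 %
c_max, k, sse = fit_first_order(times, values, grid)
print(f"plateau {c_max:.1f} %, k = {k:.3f} per time unit")
```

With these synthetic points the fit recovers the generating values (a plateau of 95.0 % and k = 0.050); with real, noisy RP-HPLC data a finer grid or a nonlinear fit of both parameters would be needed.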
CLAIMS

1. A DNA construct encoding a polypeptide and having the structure
SP-LP-(PS)-(S)-(PS)-*gene*,

wherein SP is a DNA sequence (presequence) encoding a signal peptide, LP is a
DNA sequence encoding a synthetic leader peptide (propeptide) wherein N-linked
glycosylation is lacking, PS is a DNA sequence encoding a protease processing
site which is optional in both positions, S is a DNA sequence encoding a spacer
peptide which is optional, and *gene* is a DNA sequence encoding a polypeptide.

2. A DNA construct according to claim 1, and having the structure
SP-LP-PS-*gene*, 

wherein SP, LP, PS, and *gene* have the meanings defined above. 

3. A DNA construct according to claim 2, which furthermore comprises a sequence
encoding a spacer peptide located at the 5' end of *gene* and optionally
comprises a sequence encoding a protease processing site located between the
3' end of the sequence encoding said spacer peptide and the 5' end of said
*gene*.

4. A DNA construct according to any one of the preceding claims which is 
furthermore characterised in that O-linked glycosylation of LP is lacking. 

5. A DNA construct according to any one of claims 1, 2 and 3, which is furthermore
characterised in that LP has O-linked glycosylation.

6. A DNA construct according to any one of the preceding claims, characterised in 

that LP does not comprise the consensus N-linked glycosylation sites NXT/S, 

wherein X designates any codable amino acid. 
7. A DNA construct according to any one of the preceding claims, wherein SP is a
DNA sequence selected from the group of DNA sequences encoding the S.
cerevisiae α-factor signal peptide, the signal peptide of mouse salivary amylase,
the yeast carboxypeptidase signal peptide, the yeast aspartic protease 3 signal
peptide or the yeast BAR1 signal peptide.
8. A DNA construct according to any one of the preceding claims, wherein LP is a

DNA sequence encoding a leader peptide with the general formula I: 



WO 98/32867 PCT/DK98/00026 


Q/SPIDDTESQTTSVNLMADDTESA/RFATYTXLDVVN/GL(ISMA)/(PGA)KR (I)
wherein

X is a codable amino acid or preferably a sequence of from 1 to 5 codable amino
acids which may be the same or different, and is preferably selected from the
group consisting of T, L, A, V, D, P, H, N, S and G, and Y is a codable amino acid
selected from the group consisting of Q and N.

9. A DNA construct according to claim 8, wherein Y is Q and X does not comprise S 
or T. 

10. A DNA construct according to any one of claims 1 to 7, wherein LP is a DNA
sequence encoding a synthetic leader peptide with the general formula II:

QPIDD(A/D)E(A/D)Q(A/D)(A/D)( 

WNLI(A/D)MAKR (II) 
wherein (A/D) can be any codable amino acid, but preferably is alanine (A) or
aspartic acid (D).

11. A DNA construct according to any one of claims 1 to 7, wherein LP is a DNA
sequence encoding a synthetic leader peptide with the general formula III:

QPIDD(A/D)E(A/D)Q(A/D)(A/D)(A/D)VNLMADD(A/D)E(A/D)AFA(A/D)Q(A/D)PLALDVVNLI(A/D)MA (III)

wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D).

12. A DNA construct according to any one of the preceding claims, wherein X is 
selected from the sequences NA, TLA, DLA, PLA, TLAGG, TLADGG, TLADD, 
TLAGD, NSGG, TNSGG, and TSVGG. 

13. A DNA construct according to any one of the preceding claims, wherein the
leader peptide coded for by the DNA sequence LP is selected from the group
comprising the sequences LA23, TA54, TA56, TA57, TA59, LA64, TA65, TA67,
TA68, TA76, TA77, TA78, TA79, TA80, TA89, TA90, and TA101 of Table 1
herein.

14. A DNA construct according to any one of the preceding claims, wherein the
leader peptide coded for by the DNA sequence LP is selected from the group
comprising the sequences TA75, TA75.50, TA75.15, TA75.4, TA75.51, TA75.58,
and TA75.64 of Table 1a herein.

15. A DNA construct according to any one of the preceding claims, wherein the
leader peptide coded for by the DNA sequence LP is selected from the group
comprising the sequences TA91, TA92, TA93, TA94, TA95, TA96, TA97, and
TA98, of Table 1b herein.

16. A DNA construct according to any one of the preceding claims, wherein PS is a 
DNA sequence encoding an endoprotease processing site which allows in vivo 
processing. 

17. A DNA construct according to the preceding claim wherein the processing site is
selected from DNA sequences encoding a dibasic processing site, preferably 
encoding the amino acid sequences KR, RK, RR, or KK. 

18. A DNA construct according to any one of the preceding claims, wherein PS is a 
DNA sequence encoding an endoprotease processing site which allows in vitro 

processing.

19. A DNA construct according to the preceding claim wherein the processing site is 
selected from DNA sequences encoding a monobasic or dibasic processing site, 
preferably encoding the amino acid sequences K, R, or KR, RK, RR, or KK. 

20. A DNA construct according to any one of the preceding claims, wherein the 
polypeptide is a polypeptide which is heterologous to yeast.

21. A DNA construct according to the preceding claim, wherein the polypeptide is
selected from the group consisting of aprotinin, tissue factor pathway inhibitor, or
other protease inhibitors, insulin or insulin precursors, insulin-like polypeptides,
such as insulin-like growth factor I and insulin-like growth factor II, human or
bovine growth hormone, interleukin, glucagon, glucagon-like peptide 1,
glucagon-like peptide II, GRPP, tissue plasminogen activator, transforming growth
factor α or β, platelet-derived growth factor, enzymes, or a functional analogue thereof.

22. A DNA construct according to claim 18, wherein the polypeptide is selected from
the group consisting of insulin or insulin precursors, insulin-like polypeptides, such
as insulin-like growth factor I and insulin-like growth factor II, glucagon,
glucagon-like peptide 1, glucagon-like peptide II, GRPP, or a functional analogue thereof.
23. A DNA construct according to any one of claims 1 to 17, wherein the polypeptide
is a polypeptide which is homologous to yeast. 

24. A DNA construct according to the preceding claim, wherein the polypeptide is 
selected from the group consisting of the gene products of the KEX2 gene, and 
the YAP3 gene. 

25. A DNA construct according to any one of the preceding claims which furthermore 
comprises a promoter sequence located at the N-terminal end of the structure SP- 
LP-PS-*gene*. 

26. A DNA construct according to any one of the preceding claims which furthermore 
comprises a promoter sequence located at the N-terminal end of the structure
SP-LP-(PS)-(S)-(PS)-*gene*.

27. A DNA construct according to claim 25 and 26, wherein the promoter sequence 
is a yeast promoter sequence, preferably the TPI promoter. 

28. An expression cassette comprising the DNA construct according to claim 25, 
which additionally comprises a 5' terminally located promoter sequence and 
a terminator sequence (T)i located at the 3' terminal of the structure
SP-LP-PS-*gene*, where i is 0 or 1.

29. An expression cassette comprising the DNA construct according to claim 26, 
which additionally comprises a 5' terminally located promoter sequence and 
a terminator sequence (T)i located at the 3' terminal of the structure
SP-LP-(PS)-(S)-(PS)-*gene*, where i is 0 or 1.

30. An expression cassette according to claims 28 and 29, wherein i is 1 and T is a 
DNA sequence encoding the TPI terminator. 

31. A yeast expression vector comprising the DNA construct according to any of the 
preceding claims. 

32. A yeast cell which is capable of expressing a polypeptide and which is 
transformed with a yeast expression vector according to claim 31.

33. A yeast cell according to claim 32 selected from the group consisting of 
Saccharomyces cerevisiae, Saccharomyces uvae, Saccharomyces kluyveri, 
Schizosaccharomyces pombe, Saccharomyces uvarum, Kluyveromyces lactis,
Hansenula polymorpha, Pichia pastoris, Pichia methanolica, Pichia kluyveri,
Yarrowia lipolytica, Candida sp., Candida utilis, Candida cacaoi, Geotrichum sp.,
and Geotrichum fermentans.

34. A process for producing a polypeptide in yeast, the process comprising culturing 
a yeast cell, which is capable of expressing the desired polypeptide and which is 

transformed with a yeast expression vector according to claim 31, in a suitable
medium to obtain expression and secretion of the polypeptide, after which the 
polypeptide is recovered from the medium. 

35. A process according to the preceding claim, wherein the yeast cell is selected 
from the group consisting of S. cerevisiae, Saccharomyces uvae, Saccharomyces 

kluyveri, Schizosaccharomyces pombe, Saccharomyces uvarum, Kluyveromyces
lactis, Hansenula polymorpha, Pichia pastoris, Pichia methanolica, Pichia kluyveri,
Yarrowia lipolytica, Candida sp., Candida utilis, Candida cacaoi, Geotrichum sp.
and Geotrichum fermentans, preferably Saccharomyces cerevisiae.

36. A DNA sequence encoding a synthetic prepro-leader peptide lacking the
consensus N-linked glycosylation sites NXT/S, wherein X designates any codable
amino acid which is not P.

37. A DNA sequence according to the preceding claim selected from the group 
consisting of 
Q/SPIDDTESQTTSVNLMADDTESA/RFATYTXLDVVN/GL(ISMA)/(PGA)KR (I)

wherein

X is a codable amino acid or preferably a sequence of from 1 to 5 codable amino
acids which may be the same or different, and is preferably selected from the
group consisting of T, L, A, V, D, P, H, N, S and G, and Y is a codable amino acid
selected from the group consisting of Q and N, and wherein the C-terminal KR is an
optional processing site.

38. A DNA sequence according to the preceding claim selected from the group 
consisting of LA23, TA54, TA56, TA57, TA59, LA64, TA65, TA67, TA68, TA76, 
TA77, TA78, TA79, TA80, TA89, TA90, and TA101 of Table 1 herein. 

39. A DNA sequence according to claim 36 selected from the group consisting of 
QPIDD(A/D)E(A/D)Q(A/D)(A/D)(

WNLI(A/D)MAKR (II)
wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D), and wherein the C-terminal KR is an optional processing
site.

40. A DNA sequence according to the preceding claim selected from the group 
consisting of TA75, TA75.50, TA75.15, TA75.4, TA75.51, TA75.58, and TA75.64
of Table 1a herein.

41. A DNA sequence according to claim 36 selected from the group consisting of
QPIDD(A/D)E(A/D)Q(A/D)(A/D)(A/D)VNLMADD(A/D)E(A/D)AFA(A/D)Q(A/D)PLALDVVNLI(A/D)MA (III)
wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D).

42. A DNA sequence according to the preceding claim selected from the group 
consisting of TA91, TA92, TA93, TA94, TA95, TA96, TA97, and TA98, of Table 
1b herein. 

43. A synthetic prepro-leader peptide lacking the consensus N-linked glycosylation
sites NXT/S, wherein X designates any codable amino acid which is not P.

44. A synthetic prepro-leader peptide according to the preceding claim selected from
the group consisting of

Q/SPIDDTESQTTSVNLMADDTESA/RFATYTXLDVVN/GL(ISMA)/(PGA)KR (I)
wherein

X is a codable amino acid or preferably a sequence of from 1 to 5 codable amino
acids which may be the same or different, and is preferably selected from the
group consisting of T, L, A, V, D, P, H, N, S and G, and Y is a codable amino acid
selected from the group consisting of Q and N, and wherein the C-terminal KR is an
optional processing site.

45. A synthetic prepro-leader peptide according to the preceding claim selected from
the group consisting of LA23, TA54, TA56, TA57, TA59, LA64, TA65, TA67, 
TA68, TA76, TA77, TA78, TA79, TA80, TA89, TA90, and TA101 of Table 1 
herein. 

46. A synthetic prepro-leader peptide according to claim 36 selected from the group
consisting of
QPIDD(A/D)E(A/D)Q(A/D^
WNLI(A/D)MAKR (II)

wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D), and wherein the C-terminal KR is an optional processing
site.

47. A synthetic prepro-leader peptide according to the preceding claim selected
from the group consisting of TA75, TA75.50, TA75.15, TA75.4, TA75.51, TA75.58,
and TA75.64 of Table 1a herein.

48. A synthetic prepro-leader peptide according to claim 36 selected from the group
consisting of

QPIDD(A/D)E(A/D)Q(A/D)(A/D)(A/D)VNLMADD(A/D)E(A/D)AFA(A/D)Q(A/D)PLALDVVNLI(A/D)MA (III)
wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D).

49. A synthetic prepro-leader peptide according to the preceding claim selected from
the group consisting of TA91, TA92, TA93, TA94, TA95, TA96, TA97, and TA98,
of Table 1b herein.

50. The use of a first DNA sequence encoding a synthetic prepro-leader lacking
N-linked glycosylation sites for secretion of a protein in fungal cells, such as yeast
cells.

51. Use according to the preceding claim wherein said prepro-leader additionally
lacks O-linked glycosylation sites.

52. Use according to any of claims 36 to 37, wherein said synthetic prepro-leader
has an amino acid sequence selected from the group consisting of

Q/SPIDDTESQTTSVNLMADDTESA/RFATYTXLDVVN/GL(ISMA)/(PGA)KR (I)

wherein

X is a codable amino acid or preferably a sequence of from 1 to 5 codable amino
acids which may be the same or different, and is preferably selected from the
group consisting of T, L, A, V, D, P, H, N, S and G, and Y is a codable amino acid
selected from the group consisting of Q and N, and wherein the C-terminal KR is an
optional processing site.
53. Use according to the preceding claim wherein said prepro-leader is selected from
the group consisting of LA23, TA54, TA56, TA57, TA59, LA64, TA65, TA67, 
TA68, TA76, TA77, TA78, TA79, TA80, TA89, TA90, and TA101 of Table 1 
herein. 

54. Use according to any of claims 36 to 37, wherein said synthetic prepro-leader
has an amino acid sequence selected from the group consisting of
QPIDD(A/D)E(A/D)Q(A/D)(A/D)(A/D)VNLMADD(A/D)E(A/D)AFA(A/D
WNLI(A/D)MAKR (II)

wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D), and wherein the C-terminal KR is an optional processing
site.

55. Use according to the preceding claim wherein said prepro-leader is selected from 

the group consisting of TA75, TA75.50, TA75.15, TA75.4, TA75.51, TA75.58, and 

TA75.64 of Table 1a herein.

56. Use according to the preceding claim wherein said synthetic prepro-leader has
an amino acid sequence selected from the group consisting of

QPIDD(A/D)E(A/D)Q(A/D)(A/D)(A/D)VNLMADD(A/D)E(A/D)AFA(A/D)Q(A/D)PLALDVVNLI(A/D)MA (III)

wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D).

57. Use according to the preceding claim wherein said synthetic prepro-leader has 
an amino acid sequence selected from the group consisting of TA91, TA92, TA93,
TA94, TA95, TA96, TA97, and TA98, of Table 1b herein. 

58. Use according to any of claims 36 to 43, wherein said protein is encoded by a 
second DNA sequence fused at the 5' end to said first DNA sequence encoding

said prepro-leader. 

59. Use according to the preceding claim wherein a third DNA sequence encoding a 
spacer peptide optionally having one or more processing sites is inserted in frame 
between the 3' end of said first DNA sequence encoding said prepro-leader and 

the 5' end of said second DNA sequence encoding said protein.




60. Use according to the preceding claim wherein the DNA sequence encoding said 
spacer peptide is selected from DNA sequences encoding an oligopeptide having 
1 to 12 amino acid residues, such as EEAEPK, EEGEPK, E(EA)3EPK, EEPK.

61. Use according to any of claims 36 to 46, wherein said protein is a heterologous 
protein. 

62. Use according to the preceding claim wherein said protein is selected from the 
group consisting of aprotinin, tissue factor pathway inhibitor, or other protease 
inhibitors, insulin or insulin precursors, insulin-like polypeptides, such as insulin- 
like growth factor I and insulin-like growth factor II, human or bovine growth 
hormone, interleukin, glucagon, glucagon-like peptide 1, glucagon-like peptide II, 
GRPP, tissue plasminogen activator, transforming growth factor α or β,
platelet-derived growth factor, enzymes, or a functional analogue thereof.

63. Use according to any of claims 36 to 48 wherein said protein is insulin or an 
insulin precursor. 

64. Use according to any of claims 36 to 46, wherein said protein is a homologous 
protein, preferably selected from the group consisting of the gene products of the 
yeast KEX2 and YAP3 genes. 

65. Use according to any of claims 36 to 50 wherein said yeast is selected from the 
group consisting of S. cerevisiae, Saccharomyces uvae, Saccharomyces kluyveri, 
Schizosaccharomyces pombe, Saccharomyces uvarum, Kluyveromyces lactis,
Hansenula polymorpha, Pichia pastoris, Pichia methanolica, Pichia kluyveri,
Yarrowia lipolytica, Candida sp., Candida utilis, Candida cacaoi, Geotrichum sp.,
and Geotrichum fermentans, preferably Saccharomyces cerevisiae. 
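General formula I of the claims can be read as a pattern, which makes it easy to test whether a given leader falls under it. The sketch below encodes it as a regular expression under stated assumptions: the printed "LDWN/GL" is read as LDVV(N/G)L, as in the ...GLDVVGL... stretch of the Fig. 2 and Fig. 3 translations; X spans 1 to 5 residues; Y is Q or N; and the C-terminal KR is optional, as in claims 37, 44 and 52. The pattern and the helper name `formula_i_x` are illustrative interpretations, not part of the claims.

```python
import re

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter amino acid codes

# Formula I as a regular expression (interpretation described above).
FORMULA_I = re.compile(
    r"[QS]PIDDTESQTTSVNLMADDTES[AR]FAT[QN]T"  # ...FAT, then Y (Q or N), then T
    rf"(?P<X>[{AA}]{{1,5}})"                  # X: 1 to 5 codable amino acids
    r"LDVV[NG]L(?:ISMA|PGA)(?:KR)?"           # C-terminal part; KR optional
)


def formula_i_x(leader):
    """Return the X segment if `leader` matches formula I in full, else None."""
    m = FORMULA_I.fullmatch(leader.upper())
    return m.group("X") if m else None


TA57 = "QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDVVGLISMAKR"  # from the Fig. 2 translation
TA69 = "QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDVVGLPGAKR"   # from the Fig. 3 translation
print(formula_i_x(TA57), formula_i_x(TA69))  # prints NSGG NSGG
```

Both leaders fit the pattern with X = NSGG, a value that also appears in the list of preferred X sequences in claim 12.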
Fig. 2

EcoRI 



901 TTCTTGCTTA AATCTATAAC TACAAAAAAC ACATACAGGA ATTCCATTCA 
AAGAACGAAT TTAGATATTG ATGTTTTTTG TGTATGTCCT TAAGGTAAGT 

951 AGAATAGTTC AAACAAGAAG ATTACAAACT ATCAATTTCA TACACAATAT 
TCTTATCAAG TTTGTTCTTC TAATGTTTGA TAGTTAAAGT ATGTGTTATA 

+1 MKLKTVRSAVLS 

BglII

1001 AAACGATTAA AAGAATGAAA CTGAAAACTG TAAGATCTGC GGTCCTTTCG 
TTTGCTAATT TTCTTACTTT GACTTTTGAC ATTCTAGACG CCAGGAAAGC 

+1SLFA SQV LGQ PIDD TES 

StyI



1051 TCACTCTTTG CATCTCAGGT CCTTGGCCAA CCAATTGACG ACACTGAATC 
AGTGAGAAAC GTAGAGTCCA GGAACCGGTT GGTTAACTGC TGTGACTTAG 

+1 QTT SVNL MAD DTE S A F 
1101 TCAAACTACT TCTGTCAACT TGATGGCTGA CGACACTGAA TCTGCTTTCG 
AGTTTGATGA AGACAGTTGA ACTACCGACT GCTGTGACTT AGACGAAAGC 

+1ATQT NSG GLDV VGL ISM

StyI

NcoI

1151 CTACTCAAAC TAACTCTGGT GGTTTGGATG TTGTTGGTTT GATCTCCATG 
GATGAGTTTG ATTGAGACCA CCAAACCTAC AACAACCAAA CTAGAGGTAC 

+1AK RE EGE PKF VNQH LCG 
StyI

NcoI

1201 GCTAAGAGAG AAGAAGGTGA ACCAAAGTTC GTTAACCAAC ACTTGTGCGG 
CGATTCTCTC TTCTTCCACT TGGTTTCAAG CAATTGGTTG TGAACACGCC 

+1 SHL VEAL Y L V CGE RGF 

HindIII



1251 TTCCCACTTG GTTGAAGCTT TGTACTTGGT TTGCGGTGAA AGAGGTTTCT 
AAGGGTGAAC CAACTTCGAA ACATGAACCA AACGCCACTT TCTCCAAAGA 

+1FYTP K A A KGIV EQC CTS 

Bsu36I



1301 TCTACACTCC TAAGGCTGCT AAGGGTATTG TCGAACAATG CTGTACCTCC 
AGATGTGAGG ATTCCGACGA TTCCCATAAC AGCTTGTTAC GACATGGAGG 

+1ICSL Y Q L ENY C N * 
1351 ATCTGCTCCT TGTACCAATT GGAAAACTAC TGCAACTAGA CGCAGCCCGC 
TAGACGAGGA ACATGGTTAA CCTTTTGATG ACGTTGATCT GCGTCGGGCG 

XbaI



1401 AGGCTCTAGA AACTAAGATT AATATAATTA TATAAAAATA TTATCTTCTT 
TCCGAGATCT TTGATTCTAA TTATATTAAT ATATTTTTAT AATAGAAGAA 
Fig. 3



EcoRI 



901 TTCTTGCTTA AATCTATAAC TACAAAAAAC ACATACAGGA ATTCCATTCA 
AAGAACGAAT TTAGATATTG ATGTTTTTTG TGTATGTCCT TAAGGTAAGT 

951 AGAATAGTTC AAACAAGAAG ATTACAAACT ATCAATTTCA TACACAATAT 
TCTTATCAAG TTTGTTCTTC TAATGTTTGA TAGTTAAAGT ATGTGTTATA 

+1 MKLKTVRSAVLS 

BglII



1001 AAACGATTAA AAGAATGAAA CTGAAAACTG TAAGATCTGC GGTCCTTTCG 
TTTGCTAATT TTCTTACTTT GACTTTTGAC ATTCTAGACG CCAGGAAAGC 

+1SLFA SQV LGQ PIDD TES 

StyI



1051 TCACTCTTTG CATCTCAGGT CCTTGGCCAA CCAATTGACG ACACTGAATC 
AGTGAGAAAC GTAGAGTCCA GGAACCGGTT GGTTAACTGC TGTGACTTAG 

+1 QTT SVNL MAD DTE SAF 
1101 TCAAACTACT TCTGTCAACT TGATGGCTGA CGACACTGAA TCTGCTTTCG 
AGTTTGATGA AGACAGTTGA ACTACCGACT GCTGTGACTT AGACGAAAGC 

+1ATQT NSG GLDV VGL PGA 
1151 CTACTCAAAC TAACTCTGGT GGTTTGGATG TTGTTGGTTT GCCAGGTGCT 
GATGAGTTTG ATTGAGACCA CCAAACCTAC AACAACCAAA CGGTCCACGA 

+1KRFV NQH LCG SHLV EAL 

HindIII



1201 AAGAGATTCG TTAACCAACA CTTGTGCGGT TCCCACTTGG TTGAAGCTTT 
TTCTCTAAGC AATTGGTTGT GAACACGCCA AGGGTGAACC AACTTCGAAA 

+1 Y L V CGER GFF YTP KAA 

Bsu36I



1251 GTACTTGGTT TGCGGTGAAA GAGGTTTCTT CTACACTCCT AAGGCTGCTA 
CATGAACCAA ACGCCACTTT CTCCAAAGAA GATGTGAGGA TTCCGACGAT 

+1KGIV EQC CTSI CSL Y Q L 
1301 AGGGTATTGT CGAACAATGC TGTACCTCCA TCTGCTCCTT GTACCAATTG 
TCCCATAACA GCTTGTTACG ACATGGAGGT AGACGAGGAA CATGGTTAAC

+1 E N Y C N *

XbaI



1351 GAAAACTACT GCAACTAGAC GCAGCCCGCA GGCTCTAGAA ACTAAGATTA 
CTTTTGATGA CGTTGATCTG CGTCGGGCGT CCGAGATCTT TGATTCTAAT 
Fig. 4